For the inaugural post on this new project, let's take a crack at learning constrained optimization using Lagrange multipliers.
The goal here isn't necessarily to teach others how to use this technique since there are plenty of other resources available. But rather, to demonstrate the use of the technique in a clean and communicative manner simply as an exercise for myself.
Today, we're going to use Lagrange multipliers to analytically find the extrema of a two-dimensional function, and then confirm our findings numerically.
Technique focus:
First, let's introduce the function we'll be investigating today. This example comes straight from the example section of the Wikipedia page on Lagrange multipliers.
$$ \begin{equation} \label{eq:f} f(x,y) = x^2 y \end{equation} $$Next, we define a constraint on $f$,
$$ \begin{equation} \label{eq:g} g(x,y) = x^2 + y^2 - 3 = 0 \end{equation} $$In English, our goal is to find the high and low points of $f$ along a circle that is centered on the origin and with a radius of $\sqrt{3}$.
As we see in eq. \eqref{eq:f}
First, let's form the Lagrangian,
$$ \begin{equation} L(x, y, \lambda) = f(x,y) + \lambda g(x,y) \end{equation} $$And just to make things clear, let's expand it out, substituting in the equations for $f$ and $g$. We'll also distribute the $\lambda$.
$$ \begin{equation} L(x, y, \lambda) = x^2 y + \lambda x^2 + \lambda y^2 - 3\lambda \end{equation} $$To find the extrema, we need to find values for our three unknowns $\left( x, y, \lambda \right)$. These three equations come from setting equal to zero the three independent partial derivatives of $L$ with respect to each of these variables. $$ \begin{align} \frac{\partial{L}}{\partial{x}} &= 2xy + 2 \lambda x = 0 \label{eq:L_by_x} \\ \frac{\partial{L}}{\partial{y}} &= x^2 + 2 \lambda y = 0 \label{eq:L_by_y} \\ \frac{\partial{L}}{\partial{\lambda}} &= x^2 + y^2 - 3 = 0 \label{eq:L_by_lambda} \end{align} $$
Starting with \eqref{eq:L_by_x},
$$ \begin{align} 2xy + 2 \lambda y &= 0 \nonumber \\ 2x \left( y + \lambda \right) &= 0 \nonumber \\ x \left( y + \lambda \right) &= 0 \end{align} $$This tells us that either $x = 0$ or $y = -\lambda$.
First we'll plug $x = 0$ into $\eqref{eq:L_by_lambda}$, $$ \begin{align} x^2 + y^2 - 3 &= 0 \nonumber \\ y^2 - 3 &= 0 \nonumber \\ y^2 &= 3 \nonumber \\ y &= \pm \sqrt{3} \end{align} $$
That gets us $x$ and $y$, but since we're also interested in $\lambda$ we'll next plug $x=0$ and $y = \pm \sqrt{3}$ into \eqref{eq:L_by_y}, $$ \begin{align} x^2 + 2\lambda y &= 0 \nonumber \\ 2\left(\pm\sqrt{3}\right) \lambda &= 0 \nonumber \\ \lambda &= 0 \end{align} $$
This gets us our first set of critical points: $$ \boxed{ \left( 0, \sqrt{3}, 0 \right) \\ \left( 0, -\sqrt{3}, 0 \right) } $$
Next, if $y = -\lambda$ then $\lambda = -y$. Plugging that into $\eqref{eq:L_by_y}$,
$$ \begin{align} x^2 + 2\lambda y &= 0 \nonumber \\ x^2 - 2 y^2 &= 0 \nonumber \\ x^2 &= 2 y^2 \end{align} $$And then, plugging $x^2 = 2 y^2$ into $\eqref{eq:L_by_lambda}$, $$ \begin{align} x^2 + y^2 - 3 &= 0 \nonumber \\ 2y^2 + y^2 - 3 &= 0 \nonumber \\ 3y^2 - 3 &= 0 \nonumber \\ 3y^2 &= 3 \nonumber \\ y^2 &= 1 \nonumber \\ y &= \pm \sqrt{1} = \pm 1 \end{align} $$
And since $\lambda = -y$, $\lambda = \pm 1$, though we must take care to always use the opposite sign as $y$. The result is that $\lambda y = -1$.
Plugging $y = 1$ and $\lambda = -1$, or $y = -1$ and $\lambda = 1$ into $\eqref{eq:L_by_y}$, $$ \begin{align} x^2 + 2 \lambda y &= 0 \nonumber \\ x^2 &= -2 \lambda y \nonumber \\ x^2 &= 2 \nonumber \\ x &= \pm \sqrt{2} \end{align} $$
Now we can take $x = \pm \sqrt{2}$, $y = \pm 1$, and $\lambda = \pm 1$ we get our remaining critical points. Including the first two we found, all our critical points are:
$$ \boxed{ \left( 0, \sqrt{3}, 0 \right) \\ \left( 0, -\sqrt{3}, 0 \right) \\ \left( \sqrt{2}, -1, 1 \right) \\ \left( -\sqrt{2}, -1, 1 \right) \\ \left( -\sqrt{2}, 1, -1 \right) \\ \left( \sqrt{2}, 1, -1 \right) } $$Next, we evaluate $\eqref{eq:f}$ with the $x$ and $y$ values from the critical points.
$$ \begin{align} f \left(0, \sqrt{3} \right) &= 0 \\ f \left(0, -\sqrt{3} \right) &= 0 \\ f \left(\sqrt{2}, -1 \right) &= -2 \\ f \left(-\sqrt{2}, 1 \right) &= 2 \\ f \left(-\sqrt{2}, -1 \right) &= -2 \\ f \left(\sqrt{2}, 1 \right) &= 2 \end{align} $$So we find maxima at $\left( \sqrt{2}, 1 \right)$ and $\left( -\sqrt{2}, 1 \right)$ and minima at $\left( \sqrt{2}, -1 \right)$ and $\left( -\sqrt{2}, -1 \right)$.
Now let's confirm that numerically.
# import the basics
import numpy as np
import matplotlib.pyplot as plt
from matplotlib import cm
First, let's define the function we're exploring.
def f(x, y):
return x**2 * y
Next, calculate some points along the circle that represents our constraint $g$. We can evaluate $f$ for each point along this circle and inspect the extrema.
circle_radius = np.sqrt(3)
# build the circle
theta = np.arange(0, 2*np.pi, 0.01)
g_x = circle_radius * np.cos(theta)
g_y = circle_radius * np.sin(theta)
# calculate the value of f at each point along our constraint
f_on_g = f(g_x, g_y)
f_on_g
contains the value of $f$ when evaluated at each point we've calculated along our constraint circle $g$, starting at theta = 0
and moving around the circle counterclockwise.
Next, let's visualize $f$ with a 3D mesh just so we can take a look at it and get some feel for what we're working with.
# make a mesh from -2 to 2 with 0.01-sized steps
mesh_x = np.arange(-2.0, 2.0, 0.01)
mesh_y = np.arange(-2.0, 2.0, 0.01)
mesh_x, mesh_y = np.meshgrid(mesh_x, mesh_y)
# run the mesh through f()
z = f(mesh_x, mesh_y)
# plot the result
fig = plt.figure(figsize=(16,8))
ax = fig.add_subplot(111, projection='3d')
ax.plot_surface(mesh_x, mesh_y, z, cmap=cm.viridis)
# now, let's add our constraint to it. unfortunately, matplotlib 3d plots
# behave strangely, with the mesh blocking the points even if the points
# are in front of the mesh. so we'll float the circle above the mesh a bit
# just so we can see what region we're talking about. i'll also only
# show every 10th element for clarity.
ax.scatter(g_x[::10], g_y[::10], f_on_g[::10]+2.5, color='black')
plt.show()
Just eyeballing that plot, the results we found analytically seem plausible, but it's difficult to be sure. Let's try another method.
We can make a simple plot of the value of $f$ along $g$ by unwinding the circle and using theta
(converted to degrees) as the x-axis.
fig = plt.figure(figsize=(16,8))
ax = fig.add_subplot()
theta_deg = theta * (180/np.pi)
ax.plot(theta_deg, f_on_g, color='#888')
ax.hlines(0, 0, 360)
plt.show()
As we'd expect from our earlier analysis, we see two global maxima and two global minima. But we also see a local minimum and local maximum we'll have to ingestigate.
To improve the visualization, let's add in the critical points that we found earlier, along with their $\lambda$ values. This will allow us to confirm that they line up with the observed extrema and might give us a clue as to the non-global maximum/minimum around 90° and 270°.
crit_points = np.asarray([
# x y lambda
( np.sqrt(2), -1, 1),
(-np.sqrt(2), -1, 1),
( np.sqrt(2), 1, -1),
(-np.sqrt(2), 1, -1),
(0, np.sqrt(3), 0),
(0, -np.sqrt(3), 0),
])
# first, we have to find the values of theta that correspond to the
# x and y of each critical point
lambda_thetas_deg = (180/np.pi)*(np.arctan2(crit_points[:,1], crit_points[:,0]) + np.pi)
fig = plt.figure(figsize=(16,8))
ax = fig.add_subplot()
ax.plot(theta_deg, f_on_g, color='#888')
ax.hlines(0, 0, 360)
for point in crit_points:
point_theta = (180/np.pi)*np.arctan2(point[1], point[0])
if (point_theta < 0):
point_theta += 360
f_value = f(point[0], point[1])
ax.vlines( point_theta, min(0, f_value), max(f_value, 0), color='red' )
ax.plot(
point_theta,
0,
marker='o',
markersize=10,
markerfacecolor='white',
markeredgecolor='red'
)
ax.plot(
point_theta,
f_value,
marker='o',
markersize=10,
markerfacecolor='white',
markeredgecolor='red'
)
ax.text(
point_theta + 10,
f_value,
"f={}\nx={}\ny={}\n$\lambda$={}".format(
np.round(f_value, 3),
np.round(point[0], 3),
np.round(point[1], 3),
np.round(point[2], 3)
),
fontsize=14,
bbox=dict(boxstyle="round4,pad=.5", fc="0.8"),
)
plt.show()
Comparing the above plot to our predictions that expect maxima at $\left( \sqrt{2}, 1 \right)$ and $\left( -\sqrt{2}, 1 \right)$ and minima at $\left( \sqrt{2}, -1 \right)$ and $\left( -\sqrt{2}, -1 \right)$, we see that the values we arrived at analytically agree with the values we arrived at numerically.
Additionally, the plot suggesets that the sign of $\lambda$ indicates if a point is a local maxima or minima, with negative values of $\lambda$ corresponding to local maxima, positive values of $\lambda$ corresponding to local minima, and $\lambda = 0$ corresponding to saddle points. But that will be a topic for another post.